Abstract: We extend the multi-purpose Monte-Carlo event generator SHERPA to include
processes in deeply inelastic lepton-nucleon scattering. Hadronic final states in
this kinematical setting are characterised by the presence of multiple kinematical
scales, which were up to now accounted for only by specific resummations in
individual kinematical regions. Using an extension of the recently introduced
method for merging truncated parton showers with higher-order tree-level matrix
elements, it is possible to obtain predictions which are reliable in all kinematical
limits. Different hadronic final states, defined by jets or individual hadrons, in
deep-inelastic scattering are analysed and the corresponding results are compared
to HERA data. The various sources of theoretical uncertainties of the approach
are discussed and quantified. The extension to deeply inelastic processes provides
the opportunity to validate the merging of matrix elements and parton showers in
multi-scale kinematics inaccessible in other collider environments. It also allows
the use of HERA data on hadronic final states in the tuning of hadronisation models.

1 Introduction

Deep-inelastic lepton-nucleon scattering (DIS) allows one to analyse the structure of the nucleon by means of
a pointlike probe, and provides an experimental framework for a multitude of studies of strong interaction 
dynamics. The kinematical situation of deeply inelastic events offers access to configurations which cannot 
be probed at other colliders. The (space-like) virtuality of the exchanged photon sets a hard scale for 
the scattering process, which is also the unique hard scale in inclusive deep-inelastic structure functions. 
Studying more exclusive properties of the hadronic final state allows access to multiple other scales, given 
for example by the transverse momenta of final-state jets. Such multiple-scale configurations are impossible 
to realise in $e^+e^-$ annihilation, where the centre-of-mass energy is the only hard scale of the process, and
difficult to access in purely hadronic collisions (vector-boson plus multi-jet production being an example of 
such a multi-scale configuration). Hadronic final states in DIS thus offer the unique opportunity to study 
multi-scale processes in QCD and provide the advantage of a relatively clean experimental setting. A wealth 
of corresponding experimental data is available from the HERA collider experiments H1 and ZEUS.

The kinematical situation of a deeply inelastic scattering process with incoming proton momentum $p$ and
incoming and outgoing electron momenta $k$ and $k'$ is characterised by the virtuality $Q^2$ of the exchanged
boson, carrying momentum $q = k - k'$, and by the Bjorken variable $x$, which can be inferred purely from
the outgoing electron momentum as

$$Q^2 = -q^2 = -(k - k')^2 \quad\text{and}\quad x = \frac{Q^2}{2\, q \cdot p} . \tag{1}$$

Measurements are usually performed either in the Breit frame, or in the centre-of-mass frame of proton and
virtual photon, called the hadronic centre-of-mass frame.¹ The centre-of-mass energy squared is then given
by $W^2 = Q^2 (1 - x)/x$. In lepton-hadron collisions, one generally distinguishes photoproduction processes,
where the exchanged photon is quasi-real, $Q^2 \to 0$ with $W^2$ fixed, and deeply inelastic processes. This
distinction is made experimentally by imposing a minimum cut on $Q^2$, typically of the order of a few GeV$^2$.
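
As a minimal numerical sketch of these definitions (not part of the original analysis), the following Python snippet evaluates $Q^2$, $x$ and $W^2$ from the electron and proton four-momenta. The function names, the massless-proton approximation and the HERA-like beam energies in the usage lines are assumptions made purely for illustration.

```python
def minkowski_dot(a, b):
    """Four-vector product with metric (+,-,-,-); vectors are (E, px, py, pz)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def dis_kinematics(k, k_prime, p):
    """Return (Q2, x, W2) for incoming lepton k, outgoing lepton k_prime, proton p.

    Assumes a massless proton, so that W2 = Q2 * (1 - x) / x holds exactly.
    """
    q = tuple(ki - kpi for ki, kpi in zip(k, k_prime))  # exchanged boson momentum
    q2 = -minkowski_dot(q, q)                           # virtuality Q^2 = -q^2
    x = q2 / (2.0 * minkowski_dot(q, p))                # Bjorken x
    w2 = q2 * (1.0 - x) / x                             # hadronic c.m. energy squared
    return q2, x, w2


# Illustrative (assumed) HERA-like configuration, all energies in GeV:
# electron along -z, proton along +z, both treated as massless.
k_in = (27.5, 0.0, 0.0, -27.5)
p_in = (920.0, 0.0, 0.0, 920.0)
k_out = (20.0, 12.0, 0.0, -16.0)
Q2, x, W2 = dis_kinematics(k_in, k_out, p_in)
```

The value of $W^2$ obtained this way agrees with $(q + p)^2$ evaluated directly, which is a convenient cross-check of the sign conventions.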

Inclusive structure functions as the basic quantities in deep-inelastic processes depend on $x$ and $Q^2$ only, and
the description of the proton structure in terms of parton distributions is formulated in the space of these
variables. The evolution of the parton distributions with increasing $Q^2$ is determined by the Dokshitzer-
Gribov-Lipatov-Altarelli-Parisi (DGLAP) equations [1], which are known to next-to-next-to-leading order [5]
in QCD. These equations form the basis of the QCD-improved parton model, and allow for a determination of
the process-independent parton distribution functions from global fits [3,4,5] to data from lepton-hadron
and hadron-hadron collisions. Higher-order corrections to the DGLAP equations [2] contain powers of
logarithms in $x$ or $(1 - x)$, which potentially spoil the convergence of the perturbative expansion at small and
large $x$, respectively. In both limits, resummation formalisms for the large logarithmic corrections are available: threshold
resummation at large $x$ and BFKL resummation [7] at small $x$. A unified DGLAP/BFKL resummation is
provided by the CCFM equation [8]. The currently available inclusive structure function data can, however,
be described entirely by the DGLAP framework. Considerable experimental and theoretical effort was made
especially in order to establish observables sensitive to BFKL effects. In this context, specific hadronic final
states, such as forward jets [9], are usually investigated, and jet rapidity correlations appear to be promising
observables.

Among hadronic final states, jet cross sections offer the most direct probes of parton-level dynamics. The H1
and ZEUS experiments have performed many different measurements of jet production processes, ranging
from single-jet-inclusive and di-jet (often called (2+1)-jet because of the extra proton remnant
jet) cross sections to multi-jet cross sections and jet correlations. While the former are used for precision
determinations of the strong coupling and of parton distributions, the latter offer detailed insight into the
production dynamics, and can highlight the kinematical limitations of the DGLAP framework. In this
framework, next-to-leading order (NLO) QCD predictions are available for single-jet-inclusive and di-jet
cross sections [10] and for three-jet production [11]. For central jet production, and provided that $Q^2$ is not
much smaller than the transverse energies of the jets in the Breit frame, $E_{T,B}$, these calculations yield a very
good description of the experimental data [12]. In the situation of $Q^2 \ll E_{T,B}^2$, large logarithmic corrections
of the form $\ln(Q^2/E_{T,B}^2)$ appear to all orders in perturbation theory. By attributing a parton content to
the virtual photon entering the hard process [13], these can be resummed. Including the contributions from
the virtual photon structure into the NLO QCD calculations [14] extends the kinematical range where those
are applicable (including the description of forward jet production [15]), and allows for a smooth transition
from deep-inelastic to photoproduction processes.

Within the framework of Monte-Carlo event generation, multi-jet production is usually described through
parton showers, starting from a leading-order process. Various parton-shower models exist, which are either
based on DGLAP evolution [16,17,18,19] or CCFM evolution [20]. Other methods use the colour dipole
model [21] or an approach based on the Catani-Seymour subtraction technique [22,23]. All of those models
have in common that they are capable of describing parton-level final states in certain regions of the phase
space only, which are defined by the respective resummation prescription. By construction, none of them
correctly accounts for multi-parton correlations, and therefore their predictions should be corrected
using higher-order matrix elements. Studies of electroweak gauge boson production [24] and the production
of coloured heavy states [25] indicate that an improved description of high transverse momentum jets in the
DGLAP framework can be obtained by an appropriate combination of tree-level matrix elements and parton
showers [26,27]. Such merged calculations are reliable in most kinematical limits. They can be thought of
as unifying the leading logarithmic expressions of the different resummation prescriptions (DGLAP, BFKL,
virtual photon structure) in a single calculation. Recently, new and powerful techniques have been developed,
which extend those methods [28,29]. Their advantage is that parton shower radiation patterns are recovered

1 The hadronic centre-of-mass frame is defined by q + p = 0, while in the Breit frame 2xp + q = 0. The two frames are 
related by a longitudinal boost. 

Figure 1: Schematic view of the scattering kinematics in the Breit frame for leading-order
$e^\pm q \to e^\pm q$ scattering and 2-jet production processes in DIS. The
lightly shaded blob denotes the incoming proton. For 2-jet events with
large jet transverse energy, $E_{T,B}^2 > Q^2$, the $2 \to 2$ process depicted by
the dark shaded blob in Fig. (b) sets the hard scale.



to the accuracy provided by the shower model and therefore any statement about the logarithmic accuracy
of the parton shower also holds for a merged calculation in this scheme. These novel techniques have so far
been used in two relevant cases, namely $e^+e^-$ annihilation into hadrons and Drell-Yan like production of
electroweak gauge bosons. The technical prerequisites for realising the approach in [28] were implemented
into the multi-purpose Monte-Carlo event generator SHERPA [30] in full generality. Since hadronic final
states in deep-inelastic scattering depend on multiple kinematical scales, related observables provide an
independent and particularly sensitive test of the quality of the approach and its implementation. It is the
aim of this work to present a first study of this class of processes with SHERPA, and to confront results with
data from the HERA experiments. Moreover, we propose an extension of the merging algorithm in [28],
which accounts for the proper simulation of low-$Q^2$, high-$E_{T,B}$ events.

Hadronic final states also offer insight into strong interaction dynamics at lower scales. Using kinematical
spectra of identified hadrons, it is possible to probe the parton-to-hadron transition (hadronisation), which
cannot be computed from first principles, but is usually described using semi-empirical models [31,32,33].
These models are typically tuned to data from $e^+e^-$ collider experiments and the obtained fits are
assumed to be universal. Including data from deep-inelastic processes instead allows one to probe different flavour
combinations and beam remnant fragmentation, and to resolve parameter degeneracies. It is therefore vital
to have means for simulating partonic final states in deep-inelastic processes reliably, not only to describe jet
spectra, but also to reduce uncertainties in fragmentation models, which can then be tuned to experimental
data in a combined fit. Therefore we also provide some examples for the influence of the parton-level inputs
on different fragmentation models.

The outline of this paper is as follows. Section 2 introduces the technical details of the Monte Carlo
simulation, including the proposal of an improved merging technique for low-$Q^2$ events. Section 3 presents
the results of our analysis and discusses theoretical uncertainties. Finally, Sec. 4 contains some concluding
remarks.



2 Event generation techniques 

The most striking difference between deep-inelastic scattering and processes like Drell-Yan lepton pair
production is the nearly arbitrary hard scale $Q^2$, at which the proton structure can be probed by the virtual
photon. While this presents an excellent opportunity for measuring the QCD dynamics of the process, it also 
constitutes the main obstacle for simulating it with Monte Carlo techniques. The reason for these problems 
and the solution adopted in the context of this work are outlined below. 



2.1 Parton shower evolution 

Existing parton shower simulations are often based on virtuality ordering [16,19] or transverse momentum
ordering [18,23,22]. The hard scale, i.e. the maximum evolution parameter for a set of colour connected
partons, is then usually taken to be the maximum virtuality involved in the production of these partons.
In Drell-Yan like events, for example, the hard scale is taken as the invariant mass of the final state lepton
pair. In the case of DIS it is taken as the (negative) photon virtuality $Q^2$. Using this choice it is a priori
impossible to fill the complete available phase space with parton-shower emissions, since $Q^2$ tends to be close
to zero. Next-to-leading order calculations indicate, however, that the emission probability for additional
partons is large, even if $Q^2$ is low, due to the possibly large hadronic centre-of-mass energy.

The leading contribution in this context stems from the interaction $e^\pm g \to e^\pm q\bar{q}$, where the "sub-process"
$\gamma^* g \to q\bar{q}$ plays the role of the hardest interaction, if the transverse energy squared of the final state quarks
in the Breit frame, $E_{T,B}^2$, is larger than $Q^2$ (cf. Fig. 1). A similar problem was noted in [M] in the context
of supersymmetric particle production at hadron colliders. To circumvent it, and allow a wider range of the
phase space to be accessible for parton shower radiation, so-called "power shower" schemes were employed
to artificially increase the starting scale of the initial-state shower. Although apparently in conflict with
factorisation assumptions, this approach has recently received some theoretical support from the proposition
of modified DGLAP evolution equations [55].

Another approach to overcome the restriction of the shower phase space by a low factorisation scale would
be to employ an ordering parameter different from virtuality or transverse momentum; one which is more
suited for the description of radiation off colour dipoles connected to initial-state hadrons. In fact, the proper
framework for treating initial-state radiation is given by the CCFM equations [8], which order branchings
in terms of emission angles, cf. also [17]. The CCFM scheme allows the transverse momentum of emitted
partons to become larger than the factorisation scale and can therefore provide a generic solution of the
above problem. It has successfully been used in several Monte-Carlo event generators [20]. We do not, however,
resort to the CCFM technique here. The fact that a transverse momentum ordered parton shower can
only sensibly describe parton spectra below the factorisation scale will instead be compensated by a special
technique for merging matrix elements and truncated showers, which is introduced in Sec. 2.2.

In the context of this work we employ the parton-shower algorithm initially presented in [22], which is
based on the Catani-Seymour (CS) subtraction method, cf. [35]. Modifications of the original approach to
account for recoil effects into the final state from splitting initial-state partons with a final-state spectator
were recently proposed [37] and are refined in [38]. It is interesting to investigate the corresponding effect
on the parton shower predictions. Figure 2a shows differential $n$-jet rates, i.e. the scale where an $n$-jet event
is clustered into an $(n-1)$-jet event, using the exclusive $k_T$-jet algorithm [39]. The differences between the
predictions are sizeable when switching between the original and the modified recoil scheme, especially in the
low-$k_T$ domain, $1\,{\rm GeV} < k_T \lesssim 10\,{\rm GeV}$. This implies that the choice of the recoil scheme in a given parton-
shower simulation should be part of an uncertainty analysis, much like the variation of renormalisation and
factorisation scales. We comment on this subject in Secs. 2.2 and 3. To improve the parton-shower prediction
in the domain of hard emissions and thereby facilitate the merging with NLO real emission matrix elements,
the shower splitting kernels can be modified to include matrix element corrections. The corrected splitting
kernels amount to antenna functions [40], which were used for parton showers only in $e^+e^-$ annihilation up
to now [41]. The corresponding procedure is outlined in the Appendix. Figures 2b and 2c show the influence
of these corrections on the $k_T$-jet rates in the Breit frame. We observe a substantial change in the total rate
of emissions. In the following, matrix element corrected splitting kernels are therefore employed.
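
The notion of a differential jet rate, i.e. the scale at which an $n$-jet event becomes an $(n-1)$-jet event, can be illustrated with a short clustering sketch. The code below is a simplified, longitudinally invariant exclusive $k_T$ clustering in Python and is not necessarily the Breit-frame algorithm used in the analysis. The distance measures ($d_{iB} = p_{T,i}^2$, $d_{ij} = \min(p_{T,i}^2, p_{T,j}^2)\,\Delta R_{ij}^2$), the four-momentum recombination and all function names are assumptions chosen for illustration.

```python
import math


def _pt2(p):
    """Transverse momentum squared of a four-vector (E, px, py, pz)."""
    return p[1] ** 2 + p[2] ** 2


def _rapidity(p):
    """Rapidity along the z axis; requires E > |pz|."""
    return 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))


def _delta_r2(a, b):
    """Squared distance in the rapidity-azimuth plane."""
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    dy = _rapidity(a) - _rapidity(b)
    return dy * dy + dphi * dphi


def kt_merging_scales(momenta):
    """Return {n: k_T scale at which n objects are clustered into n-1}."""
    objs = [tuple(p) for p in momenta]
    scales = {}
    while objs:
        n = len(objs)
        d_min, i_min, j_min = math.inf, -1, -1
        for i in range(n):
            d_ib = _pt2(objs[i])                      # distance to the beam
            if d_ib < d_min:
                d_min, i_min, j_min = d_ib, i, -1
            for j in range(i + 1, n):
                d_ij = min(_pt2(objs[i]), _pt2(objs[j])) * _delta_r2(objs[i], objs[j])
                if d_ij < d_min:
                    d_min, i_min, j_min = d_ij, i, j
        scales[n] = math.sqrt(d_min)                  # scale of the n -> n-1 step
        if j_min >= 0:
            # pairwise clustering: add four-momenta (E-scheme recombination)
            merged = tuple(a + b for a, b in zip(objs[i_min], objs[j_min]))
            objs = [p for k, p in enumerate(objs) if k not in (i_min, j_min)]
            objs.append(merged)
        else:
            # the object is closest to the beam and is removed
            del objs[i_min]
    return scales
```

Histogramming `scales[n]` over many events gives a differential $n$-jet rate in this simplified measure; the sensitivity to the recoil scheme discussed above would show up as a shape difference in such histograms.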

2.2 Merging matrix elements and parton showers 

Next-to-leading order calculations and parton-shower simulation present two essentially different approaches 
to perturbative QCD. Fixed-order calculations seek to determine all finite corrections to the leading-order 
process and are usually most important when measuring inclusive quantities. Parton-shower simulation aims 
at a proper resummation of large logarithmic corrections to the leading-order result, while preserving the 
overall cross section of the initial event sample. For the first emission this presents an approximation to 
the real NLO correction, whose quality largely depends on the underlying assumptions about the splitting 
kinematics and the recoil scheme, as outlined in Sec. 2.1. Thus, parton-shower simulations are inherently
incapable of describing the precise correlations between more than a few final-state QCD particles properly.
While the number of particles of interest is still low ($\mathcal{O}(1-6)$), the corresponding problems can easily be
corrected by employing full tree-level matrix elements instead of splitting kernels. Their computation has
been automated in various approaches and poses no conceptual problem. The task is then reduced to finding
an efficient and versatile algorithm for implementing the parton-shower correction in a generic way. Several
methods attempted to solve this problem in the past [2SJ[H1[17], while two especially suitable approaches 
were suggested only recently [2S1[53]. In the context of this work, the shower-independent formulation in
terms of truncated parton showers and an arbitrary jet criterion is employed, which was introduced in [28].

The basic idea is to separate the phase space into a matrix-element and a parton-shower domain through 

Figure 2: Differential 2-jet rates defined by the exclusive $k_T$-jet algorithm in the Breit frame for
deep-inelastic scattering events with $Q^2 > 4$ GeV$^2$. Part (a) compares the influence of different recoil
strategies, while parts (b) and (c) show the effect of matrix element corrections. Monte Carlo
samples were generated using the parton shower model of [22]. Scheme 1 stands for the recoil
strategy in [37,38], while scheme 2 labels the original strategy employed in [22].

Figure 3: Schematic view of three possible core process choices in DIS three-jet production.
Part (a) corresponds to the most probable core process being the virtual
photon exchange, while additional hard partons are interpreted as parton shower
emissions. Parts (b) and (c) depict configurations where the most probable core
process is the interaction of the virtual photon with a parton and a pure QCD
$2 \to 2$ process, respectively.



a cut in the real-emission phase space. The matrix-element domain is then supposed to contain hard,
well-separated partons only, while the parton-shower domain covers the region where resummation effects
become important. Throughout the hard domain parton-shower emissions are corrected using tree-level
matrix elements up to a given maximum multiplicity. In the soft domain, the parton shower is applied as is.
The separation is achieved in terms of a so-called jet criterion, defining the "hardness" and/or the separation
of a parton with respect to others [28]. This can be thought of as a kind of $k_T$-jet measure, cf. e.g. [39].

As pointed out in [55], this merging algorithm needs to be refined if the scale difference between $Q^2$ and
the hardness scale $k_T^2$ of additional partons is large and negative. In this case, logarithmic corrections are
not induced by $Q^2/q^2$, but rather by $k_T^2/q^2$, where $q^2$ is the jet resolution scale. In the case of DIS, the
production of the virtual photon can then be viewed as an electroweak splitting process, attached to the
core $\gamma^* j \to jj$ interaction, as depicted in Fig. 3b. In the extreme case of very hard jets, the core process
does not even include the virtual photon. This is visualised in Fig. 3c. The correct choice of the core process
is not arbitrary, but is fixed by the backwards clustering algorithm described in [58], cf. also [15].
To allow an inclusive merging procedure, the clustering algorithm must be able to identify the virtual photon
as a soft particle, which is removed in order to find the core process and reproduced later in unfolding
the matrix-element branching history. If QED splitting functions are included in the parton shower, the
correct method is obtained immediately, cf. also [5HJ.

The above merging algorithm can also be employed to solve the problem outlined in Sec. 2.1. That is, it can
be used to fill the complete available real emission phase space for any given $Q^2$. A similar solution is in
fact adopted in Drell-Yan lepton-pair production via $\gamma^*/Z$ exchange, where the separation cut $Q_{\rm cut}$ between
matrix-element and parton-shower domain is set such that $Q_{\rm cut} < m_{ll}$, with $m_{ll}$ being the invariant mass
of the lepton pair. This situation is particularly simple, since an experimental cut is usually applied, which
enforces $m_{ll} \approx m_Z$. Therefore $Q_{\rm cut}$ can remain constant at $Q_{\rm cut} = S_{\rm DY}\, m_Z$, where $S_{\rm DY}$ is an in principle
arbitrary constant with $0 < S_{\rm DY} < 1$. Of course, $S_{\rm DY}$ must be chosen sensibly, so as not to drive $Q_{\rm cut}$
into the non-perturbative domain. Also, $S_{\rm DY}$ should not be too close to one, since the proper description
of particle spectra in this region largely depends on the recoil strategy employed in the shower. In practice,
we have $0.1 < S_{\rm DY} \lesssim 0.5$. In deep-inelastic scattering the situation is slightly different due to the variable
value of $Q^2$. The solution can, however, be identical. We choose

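$$Q_{\rm cut} = \bar{Q}_{\rm cut} \left[\, 1 + \frac{\bar{Q}_{\rm cut}^2}{S_{\rm DIS}^2\, Q^2} \,\right]^{-1/2} , \tag{2}$$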

where $\bar{Q}_{\rm cut}$ is a fixed value, much like $Q_{\rm cut}$ in the Drell-Yan pair production case. It ensures that high-$Q^2$,
medium-$E_{T,B}$ events are described by matrix elements, rather than by the parton shower. At the same
time, the factor in the square bracket, including $S_{\rm DIS} < 1$, forces low-$Q^2$, high-$E_{T,B}$ events to be in the
matrix-element domain as well, such that the complete available real-emission phase space can be filled by
the Monte-Carlo simulation. Note that, contrary to the large freedom in the choice of $\bar{Q}_{\rm cut}$, we are rather
limited in the choice of $S_{\rm DIS}$. Most analyses of deep-inelastic scattering data employ a cut on the photon
virtuality which is of the order of a few GeV$^2$. The Monte-Carlo simulation, however, is bound to have $Q_{\rm cut}$
in the perturbative domain with some difference between $Q_{\rm cut}^2$ and $Q^2$, as discussed above. This introduces
rather strict limits on the available range for $S_{\rm DIS}$. To be specific,

$$0.4 < S_{\rm DIS} < 0.8 , \tag{3}$$

where the lower bound depends on the experimental setup and the upper bound depends on the parton-
shower model.
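
The behaviour of the separation cut can be sketched numerically. The Python snippet below assumes a cut of the form $Q_{\rm cut} = \bar{Q}_{\rm cut}\,[1 + \bar{Q}_{\rm cut}^2/(S_{\rm DIS}^2 Q^2)]^{-1/2}$, which reproduces the two limits described in the text ($Q_{\rm cut} \to \bar{Q}_{\rm cut}$ at high $Q^2$ and $Q_{\rm cut} \to S_{\rm DIS}\,Q$ at low $Q^2$). The function names, the default parameter values and the numerical value of $m_Z$ are illustrative assumptions and not taken from the SHERPA code.

```python
import math


def q_cut_drell_yan(s_dy, m_z=91.1876):
    """Constant separation cut for Drell-Yan: Q_cut = S_DY * m_Z (GeV).

    The default m_Z is an assumed input value.
    """
    return s_dy * m_z


def q_cut_dis(q2, q_cut_bar=5.0, s_dis=0.6):
    """Q^2-dependent separation cut for DIS (assumed functional form).

    q2        : photon virtuality Q^2 in GeV^2
    q_cut_bar : fixed reference cut in GeV
    s_dis     : dimensionless parameter, S_DIS < 1
    """
    return q_cut_bar / math.sqrt(1.0 + q_cut_bar ** 2 / (s_dis ** 2 * q2))


def in_matrix_element_domain(kt, q2, q_cut_bar=5.0, s_dis=0.6):
    """True if an emission of hardness kt (GeV) lies above the separation cut."""
    return kt > q_cut_dis(q2, q_cut_bar, s_dis)
```

With these assumptions the cut never exceeds either $\bar{Q}_{\rm cut}$ or $S_{\rm DIS}\,Q$, so a jet of a few GeV in a low-$Q^2$ event is always assigned to the matrix-element domain.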

Figure 4 illustrates the effect of different $\bar{Q}_{\rm cut}$ and different $S_{\rm DIS}$ on the prediction for the differential 2- and 3-
jet rates in the Breit frame. The Monte-Carlo result remains very stable against corresponding variations. It
can also be seen that when merging the parton shower with matrix elements, previous differences arising from
different recoil strategies are reduced considerably. This is essentially because the parton-shower contribution to
the observable is largely reduced, such that kinematical effects from shower branchings have far less influence
than in event samples without matrix-element merging.

3 Comparison with experimental data

In this section, Monte-Carlo predictions, generated according to Secs. 2.1 and 2.2, are confronted with
hadronic final state data taken by the H1 experiment. The correct description of the selected measurements
is quite challenging for the Monte Carlo programs traditionally used in the analysis of HERA data [44]. We seek
to quantify the effect of varying perturbative input parameters and varying intrinsic parameters of the
merging approach. We are mainly interested in the hard, perturbative domain, and therefore we choose
to focus particularly on jet analyses. Monte-Carlo predictions stem from the SHERPA program [30], which,
in this context, employs the matrix-element generator COMIX [45] to simulate the hard processes. Parton
showers are implemented by the dipole-like cascade presented in [22]. Hadronisation is simulated either
using the cluster fragmentation model of SHERPA [33], or the Lund string fragmentation model [31] in the
implementation of PYTHIA [46]. Both models were previously tuned to describe LEP data employing the
PROFESSOR program [47].² Hadron decays are implemented by SHERPA's internal hadron decay module [45]
or by PYTHIA [45], depending on the hadronisation model employed. Photon radiation is simulated by
SHERPA's internal YFS generator [4S]. All analyses are carried out using the HZTool library [50].

If not stated otherwise, matrix elements with up to five QCD partons in the final state are employed and
the parameters of the matrix-element parton-shower merging according to Sec. 2.2 are set to $S_{\rm DIS} = 0.6$
and $\bar{Q}_{\rm cut} = 5$ GeV. The default PDF set is NNPDF 1.2 [4] in the implementation with 100 replicas.
The perturbative order of the strong coupling and its value at the reference scale $m_Z$ are always chosen in
accordance with the PDF set.

2 We are indebted to Frank Krauss, Hendrik Hoeth and Eike von Seggern for making preliminary sets of tuning parameters 
available. 

Figure 4: The differential 2- and 3-jet rates in merged event samples of varying $\bar{Q}_{\rm cut}$ (a), varying $S_{\rm DIS}$ (b)
and varying shower recoil strategy (c). See also Fig. 2 for notation. Coloured lines display the
contributions of different final state multiplicity matrix elements. The central parameter value
is chosen as $\bar{Q}_{\rm cut} = 5$ GeV and $S_{\rm DIS} = 0.6$. The maximum parton multiplicity in hard matrix
elements is $N_{\max} = 3$.



3.1 Inclusive jet analysis 

Following the arguments of Secs. 2.1 and 2.2, a crucial observable is given by the inclusive jet cross section,
differential with respect to $E_{T,B}^2/Q^2$, where $E_{T,B}$ is the jet transverse energy in the Breit frame. For
$E_{T,B}^2/Q^2 > 1$ it probes a part of the phase space where leading order Monte-Carlo models without the
inclusion of low-$x$ effects are bound to fail in their description of jet spectra. The question to be answered
here is whether the incorporation of higher order tree-level matrix elements is sufficient to improve on this
deficiency and yield predictions which are consistent with experimental data. Inclusive $E_{T,B}^2/Q^2$ spectra were
measured for different ranges of the jet pseudorapidity in the laboratory frame, $\eta_{\rm lab}$, in the low-$Q^2$ domain
$5 < Q^2 < 100$ GeV$^2$ by the H1 collaboration [51]. In this analysis jets were defined using the inclusive
$k_T$-algorithm [52] and were constrained to $E_{T,B} > 5$ GeV and the pseudorapidity range $-1 < \eta_{\rm lab} < 2.8$. It
was found that next-to-leading order QCD calculations can describe the data reasonably well, while large
differences between leading-order and next-to-leading order results have been observed, especially in the
forward region $1.5 < \eta_{\rm lab} < 2.8$.

Results from our analysis are compared to the H1 data in Figs. 5 to 7. Figure 5a shows that the Monte-Carlo
prediction gradually improves with a growing number of final-state partons in the hard matrix elements. This
behaviour is expected. Including matrix elements of larger final-state multiplicity corresponds to opening
the full phase space for high-$E_{T,B}$ jet production. If all possible channels are to be incorporated, matrix
elements with at least three final state partons must be available (3-parton sample). Indeed we observe that
if only matrix elements with up to two final-state partons are considered (2-parton sample), the Monte-
Carlo prediction is far off the data. While the 3-parton sample gives an improved description, the data are
described satisfactorily only by a 4-parton sample. Including one additional emission in the matrix elements
does not alter the results too much. This can be understood as an effect of the definition of the observable.
Since inclusive jet spectra are investigated, the proper description of the kinematics of sub-leading jets can
be as important as the simulation of the leading jet.

Theoretical uncertainties of the Monte-Carlo prediction are shown in Figs. 5b to 6b. The variation of
renormalisation and factorisation scales has the largest impact on our results. This is expected, since an
improved leading order approach is employed, which does not allow one to predict the total cross section correctly.
Higher-order virtual corrections are missing in this algorithm. However, the systematic inclusion of higher-
order real corrections allows the prediction of the jet spectra in arbitrary multi-jet topologies at leading
order, a feature which is inherently not present in any fixed-order calculation. The uncertainties associated
with a variation of the intrinsic parameters of the merging algorithm are small compared to the variation
found when altering the renormalisation and factorisation scale. This is exemplified in Fig. 5b.

Figure 7a shows that the uncertainty related to the parton shower recoil strategy is negligible. This is a
direct consequence of the merging approach and has been discussed in Sec. 2.2, cf. Fig. 4. We also compare
the influence of different PDF sets on the results of our analysis. All those PDFs are based on next-to-leading
order fits. Figure 7b shows that the corresponding results are essentially compatible, while a slight preference
can currently be given to the NNPDF 1.2 set, cf. also Fig. 9b.

3.2 Inclusive jet and di-jet analysis 

It is interesting to investigate jet properties in some more detail. The analysis presented in [53] covers a wider
range of $Q^2$ and presents jet-$E_{T,B}$ spectra doubly differential in $Q^2$ and $E_{T,B}$. Jet pseudorapidities ($\eta_{\rm lab}$)
and pseudorapidity differences ($\eta' = |\eta_{B,1} - \eta_{B,2}|/2$) were also analysed. Next-to-leading order calculations
turn out to be particularly important for the latter, with deviations from leading-order predictions being
most pronounced in the region of large $\eta_{\rm lab}$ of the forward jet. The acceptance region of this measurement is
$5 < Q^2 < 15000$ GeV$^2$ and $-1 < \eta_{\rm lab} < 2.5$. Jet transverse energies are subject to the cuts $E_{T,B}^{1,2} > 5$ GeV
and $E_{T,B}^{1} + E_{T,B}^{2} > 17$ GeV. The latter requirement is introduced to avoid $E_{T,B}^{1} \approx E_{T,B}^{2}$, which is the
region of the phase space where next-to-leading order corrections are unstable due to implicit restrictions on
soft emissions [54].
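
The acceptance cuts quoted above can be collected in a small selection function. The following Python sketch is only an illustration of the stated cuts; the function names and the representation of a jet as an `(et_breit, eta_lab)` tuple are assumptions, and the actual analysis is carried out with the HZTool library rather than with code of this form.

```python
def passes_dijet_selection(q2, jets):
    """Apply the di-jet acceptance cuts quoted in the text.

    q2   : photon virtuality Q^2 in GeV^2
    jets : iterable of (et_breit, eta_lab) tuples, E_T in GeV
    """
    if not (5.0 < q2 < 15000.0):
        return False
    # keep jets inside the transverse-energy and pseudorapidity acceptance
    accepted = sorted(
        (et for et, eta in jets if et > 5.0 and -1.0 < eta < 2.5),
        reverse=True,
    )
    if len(accepted) < 2:
        return False
    # asymmetric requirement on the two hardest jets
    return accepted[0] + accepted[1] > 17.0


def eta_prime(eta_b1, eta_b2):
    """Half the pseudorapidity difference of the two jets in the Breit frame."""
    return abs(eta_b1 - eta_b2) / 2.0
```

For example, two jets with Breit-frame transverse energies of 12 GeV and 6 GeV pass the sum requirement, whereas two jets of 8.4 GeV each do not, which is exactly the near-symmetric configuration the cut is designed to remove.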

A good probe of the proper Monte-Carlo simulation of such events is the di-jet cross section shown in Fig. 6
of [53]. While still a relatively inclusive quantity compared to the double differential spectra in the rest of this
analysis, it tests the proper behaviour of jet production with decreasing $Q^2$ and is therefore as important
as the $E_{T,B}^2/Q^2$ spectra shown previously. It can be seen in Fig. 8a that a large number of partons
simulated with hard matrix elements is needed to describe this observable properly. Once a good description
is obtained, however, we are also capable of predicting the double differential jet spectra in $E_{T,B}$ and $\eta'$, cf.
Figs. 10 and 11. The agreement is excellent over the complete $Q^2$-range of the measurement, which implies
that the merging approach is well capable of describing the dynamics of multi-jet final states, if the maximum
multiplicity in hard matrix elements is large enough. We show theoretical uncertainties associated with our
predictions of the $Q^2$-spectrum in Figs. 8b to 8d. The same comments as for the previous section apply. It
can also be seen in Fig. 8c that a variation of renormalisation and factorisation scales does not only result
in a global K-factor, i.e. a redefinition of the total cross section. Varying these scales can instead induce a
distortion of jet and particle spectra and it is therefore important to assess the related uncertainties.

3.3 Low-x di-jet analysis 

In DIS di-jet events, next-to-leading order corrections are especially large when the two jets in the Breit
frame have similar transverse energy [54]. It is thus interesting to study an observable which singles out the
corresponding region of the phase space. A dedicated measurement which defines such an observable was
carried out for the low-$Q^2$, low-$x$ domain by the H1 collaboration [55]. Jets were defined using the inclusive
$k_T$-algorithm [52] and were constrained to $E_{T,B} > 5$ GeV and the pseudorapidity range $-1 < \eta_{\rm lab} < 2.5$.
Deep-inelastic scattering events were selected in the kinematic range 5 GeV$^2$ $< Q^2 <$ 100 GeV$^2$ and
$10^{-4} < x < 10^{-2}$. A variable $\Delta$ was defined by the requirement $E^*_{T,\max} > E^*_{T,{\rm cut}} + \Delta$, where $E^*_{T,{\rm cut}}$ is the minimum
jet transverse energy and $E^*_{T,\max}$ is the transverse energy of the hardest jet. All quantities marked with
an asterisk are given in the hadronic centre-of-mass frame, which is related to the Breit frame simply by a
longitudinal boost [55]. An observable $\Delta\eta^*$ was defined as the pseudorapidity difference between the two
hardest jets in the event for a fixed value of $\Delta = 2$ GeV.
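
The requirement on the hardest jet and the pseudorapidity difference of the two hardest jets can be sketched as follows. This is a minimal illustration only: the function names, the representation of a jet as an `(et_star, eta_star)` tuple and the demand for at least two jets above the minimum transverse energy are assumptions of this example.

```python
def passes_delta_requirement(jet_ets_star, delta, et_cut=5.0):
    """Check that the hardest jet exceeds et_cut + delta (all values in GeV).

    jet_ets_star holds jet transverse energies in the hadronic
    centre-of-mass frame. Requiring two jets above et_cut is an
    assumption of this sketch.
    """
    hard = [et for et in jet_ets_star if et > et_cut]
    if len(hard) < 2:
        return False
    return max(hard) > et_cut + delta


def delta_eta_star(jets):
    """Absolute pseudorapidity difference of the two hardest jets.

    jets is a list of (et_star, eta_star) tuples with at least two entries.
    """
    leading = sorted(jets, key=lambda j: j[0], reverse=True)[:2]
    return abs(leading[0][1] - leading[1][1])
```

With `delta = 0` both jets may have nearly equal transverse energy; increasing `delta` moves the selection away from that region.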
The results of our Monte-Carlo analysis are compared to the H1 data in Figs. [12] and [13]. For the $\Delta$ spectra
we present parton level predictions and hadron level results. It can be seen that the effect of hadronisation
on this observable is rather large in the region of very low $x$. The fact that hadronisation corrections are
not uniform over the phase space indicates the importance of multi-purpose Monte-Carlo event generators.
Full hadron-level events can easily be simulated by such programs. Also, the effect of different hadronisation
models can be studied. We observe that the overall description of the data improves once hadronisation
corrections are included. The fragmentation model employed here is the cluster fragmentation of [33].

3.4 Three-jet analysis 

The three-jet analysis presented in [56] allows one to test perturbative QCD predictions for events including
one additional hard jet. Two angles, $\theta_3$ and $\psi_3$, were introduced in [57], which, together with the scaled
energies of the jets, i.e. the jet energies w.r.t. the invariant mass of the three-jet system, can be used to
parametrise the phase space of three-jet events. At the same time they exhibit some sensitivity to the
correct simulation of the QCD dynamics, cf. [57, 56]. They are defined in the three-jet centre-of-mass frame,
with $\theta_3$ being the angle of the most energetic jet w.r.t. the proton beam direction and $\psi_3$ the angle between
the plane containing the most energetic jet and the proton beam and the plane containing all three jets.
The inclusive $k_T$-algorithm [52] was employed in [56] to define jets, which were then constrained to the
region $E_{T,B} > 5$ GeV and $-1 < \eta_{\rm lab} < 2.5$. Due to the construction of the H1 detector, the phase space
of the measurement is slightly different in the low-$Q^2$ (5 GeV$^2$ $< Q^2 <$ 100 GeV$^2$) and in the high-$Q^2$
(150 GeV$^2$ $< Q^2 <$ 5000 GeV$^2$) analysis. Details can be found in the original publication.
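
The verbal definition of the two angles can be turned into a short calculation. The sketch below assumes that the three jets are already given as `(E, px, py, pz)` tuples in the three-jet centre-of-mass frame and that `proton_dir` is the proton beam direction in that frame. The function names and the orientation of the plane normals (which fixes the sign of $\cos\psi_3$) are assumptions of this example and may differ from the conventions of [57].

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def three_jet_angles(jets, proton_dir):
    """Return (cos_theta3, cos_psi3) for three jets in their CM frame.

    jets: three (E, px, py, pz) tuples; proton_dir: 3-vector of the beam.
    The sign convention of cos_psi3 is an assumption of this sketch.
    """
    # Order the jets by energy; p3 is the most energetic one.
    ordered = sorted(jets, key=lambda j: j[0], reverse=True)
    p3, p4, p5 = (j[1:] for j in ordered)
    cos_theta3 = _dot(p3, proton_dir) / (_norm(p3) * _norm(proton_dir))
    # Normal of the plane spanned by the leading jet and the beam ...
    n_beam = _cross(p3, proton_dir)
    # ... and normal of the plane containing all three jets.
    n_jets = _cross(p4, p5)
    cos_psi3 = _dot(n_beam, n_jets) / (_norm(n_beam) * _norm(n_jets))
    return cos_theta3, cos_psi3
```

The calculation is undefined when the leading jet is exactly collinear with the beam, since the first plane then degenerates; a production analysis would need to guard against that case.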

Figure [13] compares the results of our Monte-Carlo analysis with data. The distributions of the angles $\theta_3$
and $\psi_3$ are relatively well described by the simulation. We also show the $Q^2$-distribution of the three-jet
events, where the comments of Sec. [3.2] apply. We observe that the $Q^2$-spectrum is matched very well by the
Monte-Carlo prediction, which again indicates the relevance of including high-multiplicity matrix elements
into the simulation. A particularly useful observable to test the correct description of multi-jet rates, which
is not available in di-jet events, is the ratio of the three- over the two-jet cross section, $R_{32}$. This quantity
is independent of the overall normalisation of the event sample and is therefore especially suited to validate
the Monte-Carlo models employed by leading-order event generators. We find satisfactory agreement with
the corresponding data over the complete observed $Q^2$-range.
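
The statement that the three- over two-jet ratio does not depend on the overall normalisation can be checked with a few lines of Python. The function name and the numbers used below are purely hypothetical placeholders chosen for the illustration; they are not measured or simulated cross sections.

```python
def r32(sigma_three_jet, sigma_two_jet):
    """Bin-wise ratio of three-jet over two-jet cross sections."""
    return [s3 / s2 for s3, s2 in zip(sigma_three_jet, sigma_two_jet)]


# Hypothetical cross sections in two Q^2 bins (arbitrary units).
sigma3 = [1.0, 0.5]
sigma2 = [4.0, 2.0]

# Rescaling both samples by a common factor leaves the ratio unchanged.
scale = 2.0
ratio = r32(sigma3, sigma2)
ratio_rescaled = r32([scale * s for s in sigma3], [scale * s for s in sigma2])
```

Because a common factor cancels bin by bin, a leading-order generator with an uncertain total rate can still be tested on the shape and size of this ratio.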

3.5 Jet-shape analysis 

The analysis presented by the H1 collaboration in [55] investigates shapes and sub-jet rates of jets defined
using either the inclusive $k_T$-algorithm [55] or a cone algorithm [5S]. While the jet shape $\Psi(r)$ receives
sizeable contributions from non-perturbative effects over the whole radial range of the jet, the sub-jet rate
becomes fairly independent of non-perturbative dynamics at large values of the resolution parameter [58]. It
is therefore a useful observable for measuring the perturbative dynamics of jet final states and can be used
in particular to validate our Monte-Carlo simulation of parton evolution.

We exemplify in Figs. [15] and [16] that these observables are described satisfactorily by our Monte-Carlo
approach. In fact, this can be expected once jet rates and event shapes are fitted in $e^+e^-$ experiments.
In this respect, we present a simple but necessary cross-check on the universality of the parton shower
and hadronisation algorithms, which tests the nontrivial extension from pure final-state parton evolution to
combined final- and initial-state evolution.

3.6 Energy-flow analysis 

Energy flows are crucial observables to determine the properties of QCD final states in the region where
perturbative and non-perturbative effects are equally important. It has been pointed out [60] that fragmentation
might have as big an impact on these observables as has the perturbative input from hard processes
and parton showering. This statement is supported by the observation that a large part of the spectra can
be tuned to describe the data by varying fragmentation parameters only. However, the influence of the
perturbative input, i.e. the distribution of partons in the phase space and their colour correlations after the
termination of the parton cascade, cannot be neglected. Transverse energy flows thus constitute an ideal
observable to test the interplay between the hard, perturbative event phase and the hadronisation phase
in Monte Carlo programs. The analysis presented by the H1 collaboration in [51] extended previous measurements
to a larger $\eta$-range and higher $Q^2$, where the usage of a forward calorimeter (PLUG) allowed the
determination of data points at very low $\eta^*$, the particle rapidity in the hadronic centre-of-mass frame. As
pointed out in [61], the analysis of the transverse energy flow in this frame of reference isolates the physically
interesting part of the distribution.

Figures [17] and [18] compare our Monte Carlo predictions to the H1 data. We find very good agreement
when employing the cluster fragmentation model of [33], while the Lund string fragmentation [31] gives
predictions which are slightly off the data. It should be noted that a set of parameters can be found with
which the string fragmentation model gives better results for this particular observable. The fact that these
parameters do not match those for which the model has been tuned to LEP data indicates the importance
of a combined analysis when tuning intrinsic parameters of Monte-Carlo event generators.

3.7 Charged particle spectra analysis 

Due to the large dependence of the transverse energy flows on hadronisation effects, transverse momentum
spectra and pseudorapidity spectra of charged particles have been measured additionally by the H1 collaboration
in [52]. It was argued in [53] that these observables provide a more direct measure of parton dynamics
through a strong correlation between partons and final-state particles and might therefore be crucial to
distinguish between DGLAP- and BFKL-driven parton evolution. The influence of hadronisation should be
more pronounced in the low-$p_T$ region, while the high-$p_T$ tail of the distributions is more sensitive to perturbative
effects. Significant discrepancies have been observed in the high-$p_T$ domain between the data and
predictions from DGLAP-based Monte-Carlo models. Deviations also occur in the particle flow for $p_T > 1$
GeV tracks.

We observe similar effects when comparing our Monte-Carlo results with the H1 data. Even though up to
five-parton final states are included in our Monte-Carlo simulation, Fig. [19] shows discrepancies especially
in the high-$p_T$ region. The particle flow for tracks with transverse momentum larger than 1 GeV, shown in
Fig. [20], projects onto the critical part of the phase space. We show results from two different Monte-Carlo
setups, labeled "Set 1" and "Set 2". While "Set 1" was produced using the cluster hadronisation model in
combination with the NNPDF 1.2 PDF set, "Set 2" displays predictions from the Lund string hadronisation in
combination with the CTEQ 6L1 PDF set. We observed that the particle flow can be described satisfactorily
by "Set 2". The corresponding results for other observables, however, do not match the data. Using this
parameterisation, for instance, transverse energy flows cannot be described satisfactorily. Hence, at present,
there is no agreement with data for these observables. This finding highlights the importance of including
HERA data into the global tuning of hadronisation parameters.

3.8 Charged multiplicity analysis 

Multiplicity distributions are one of the basic observables in hadronic final states. Much like the transverse 
energy flows, they allow a validation of the interplay between perturbative and nonperturbative parts of a 
Monte-Carlo simulation of detector events. The evolution of charged particle multiplicities with the hadronic 
centre-of-mass energy, $W$, and their dependence on the allowed pseudorapidity range was studied by the H1
collaboration in [64].

We present a comparison between our Monte-Carlo results and the data in Fig. [21]. Good agreement over
the complete $W$ and $\eta$ range is observed.

4 Conclusions 

In this publication, we have extended the SHERPA event-generation framework to describe hadronic final 
states in deep-inelastic lepton-nucleon scattering processes. SHERPA is a modern event generator, which 
implements the merging of matrix-element based event generation with a parton shower. 

The merging procedure relies on a backward clustering algorithm according to an inverted parton shower, 
which determines a hard core process at the origin of the matrix element or the parton shower. For a 
fully inclusive matrix-element parton-shower merging, the clustering must be performed on all outgoing 
particles. Depending on the final state kinematics, characterised by the photon virtuality $Q^2$ and the
transverse momenta of final state jets, the core process in deep-inelastic scattering is then either found as
electron-quark scattering, photon-quark scattering or a partonic $2 \to 2$ scattering process.

To account for the kinematical situation in deep-inelastic scattering, the merging procedure in SHERPA was
refined, taking into account that most hadronic final states are characterised by several hard scales, not only
by $Q^2$. By choosing appropriate merging scales, a successful description of processes in all kinematical regions
(including the low-$x$ region, and including high-$E_{T,B}^2$, low-$Q^2$ processes) could be obtained in a theoretically
consistent manner that respects factorisation. To reduce the merging uncertainty, modifications to the
parton shower kernels were made. Finite terms were added to the previously used dipole kernels to ensure
that the kernels amount to the full matrix elements associated with the splitting process (as in antenna-based
showers). With these refinements, SHERPA is the first multi-purpose event generator program for deep-inelastic
processes which incorporates a full merging of leading-order matrix elements with parton showers.

We validated our results on a multitude of HERA data on hadronic final states in deep-inelastic scattering, 
including jet cross sections, jet-transition rates and hadronic particle spectra. All observables considered 
are described in a very satisfactory manner. We quantified the uncertainties due to scale choices, merging 
parameters, parton-shower schemes, parton distribution functions and hadronisation models. 

The comparison with HERA data provides an important validation of the SHERPA initial- and final-state 
parton-shower schemes in non-trivial kinematical situations hardly accessible in other collider environments. 
It has important consequences for LHC studies in similar kinematical situations (like low-mass Drell-Yan or 
vector-boson plus multi-jet production). 

Using the HERA data set on different aspects of hadronic final states will allow for a validation and tuning 
of hadronisation models, which was based up to now purely on data from $e^+e^-$ annihilation. Inclusion of
DIS data probes different flavour combinations, and will help to resolve parameter degeneracies, thereby 
leading to important improvements of the hadronisation models. 
A Matrix element correction of the splitting kernels

The original parton shower algorithm [22], based on Catani-Seymour dipoles, can be modified easily to
improve the radiation pattern in deep-inelastic scattering. The key idea is to add nonsingular terms
to the original spin-averaged dipole functions, such that combining the radiation functions of emitter and
spectator yields the exact NLO real radiation matrix element. This correction does not spoil the logarithmic
accuracy of the parton shower. However, merging the shower with higher-order tree-level matrix elements
along the lines of Sec. [2.2] is then made easier for the first emission, because the radiation patterns are formally
identical.³ We focus on massless partons.

The situation in final-state parton splittings with initial-state spectator is sketched in Fig. [4b]. We employ
the variables

$$ z_i = \frac{p_i p_a}{(p_i + p_j)\,p_a} \quad\text{and}\quad x_{ij,a} = 1 - \frac{p_i p_j}{(p_i + p_j)\,p_a} \,. $$
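
For orientation, the two variables can be evaluated directly from four-momenta. The sketch below reads the display above as $z_i = p_i p_a/((p_i+p_j)p_a)$ and $x_{ij,a} = 1 - p_i p_j/((p_i+p_j)p_a)$, the standard Catani-Seymour choice for a final-state emitter with an initial-state spectator. The function names, the `(E, px, py, pz)` tuple layout and the example momenta are assumptions of this illustration.

```python
def mdot(p, q):
    """Minkowski product of two four-vectors (E, px, py, pz), metric (+,-,-,-)."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]


def fi_dipole_variables(p_i, p_j, p_a):
    """Return (z_i, x_ija) for final-state partons i, j and initial-state spectator a."""
    pij_pa = mdot(p_i, p_a) + mdot(p_j, p_a)  # (p_i + p_j) . p_a
    z_i = mdot(p_i, p_a) / pij_pa
    x_ija = 1.0 - mdot(p_i, p_j) / pij_pa
    return z_i, x_ija


# Hypothetical massless momenta, in GeV, chosen only to exercise the formulas.
p_a = (10.0, 0.0, 0.0, 10.0)
p_i = (5.0, 3.0, 0.0, 4.0)
p_j = (5.0, 0.0, 3.0, 4.0)
z_i, x_ija = fi_dipole_variables(p_i, p_j, p_a)
```

In the collinear limit $p_i p_j \to 0$ one finds $x_{ij,a} \to 1$, while $z_i$ plays the role of the light-cone momentum fraction carried by parton $i$.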

The corresponding spin-averaged splitting functions are given in [22]. While (V% } ) is left unchanged, we 



3 In fact the identity of radiation patterns largely depends on the recoil scheme of the parton shower, cf. [65]. In this context,
we only refer to the processes $\gamma^* g \to q\bar{q}$ and $\gamma^* q \to q g$, averaged over the virtual photon spin.
Figure 5: Schematic view of (a) the splitting of an initial-state parton with a final-state
spectator and (b) the splitting of a final-state parton with an initial-state
spectator. The blob denotes the hard matrix element. Incoming
and outgoing lines label initial- and final-state partons, respectively.



redefine, in the context of this work 

2 



(V- w )(z i ,a:y,„)=r Jl j (l - 2 * (1 - *)) h- ^^^ j + C^(^, %> )| . 
These functions differ from the original evolution kernels by the additional nonsingular factors 

As required, the corrections vanish in the soft and collinear limits and the original evolution kernels remain. 
An initial-state parton splitting with final-state spectator is sketched in Fig. [2a]. We employ the variables

$$ u_i = \frac{p_i p_a}{(p_i + p_k)\,p_a} \quad\text{and}\quad x_{ik,a} = 1 - \frac{p_i p_k}{(p_i + p_k)\,p_a} \,. \qquad (7) $$

The corresponding spin- averaged splitting functions (V) are presented in (55]. While (V^" 9 *) and (V| q9 *) 
are left unchanged, we redefine, in the context of this work 

{V 9 k 9i )(xik, a ,U i )=C F ( — 2 (1 + + «fc)} . 

(V 9 k aqi )(x ik , a , Ui )=T R j(l-2a: tt , a (l -*«,„)) ^ - ^"^'^'"^ + C££, (*«,«.«*)} . 

Using the above modifications, it can be shown that the combination of appropriate splitting kernels indeed
reproduces the complete real-emission matrix elements for the processes $\gamma^* g \to q\bar{q}$ and $\gamma^* q \to q g$.
Figure 6: The inclusive jet cross section as a function of $E_{T,B}^2/Q^2$ in bins of $\eta_{\rm lab}$, measured by the H1
Collaboration [51]. $E_{T,B}$ is the jet transverse energy in the Breit frame, while $\eta_{\rm lab}$ denotes
the jet rapidity in the laboratory frame. Part (a) displays the influence of the maximum parton
multiplicity, $N_{\max}$, from hard matrix elements. We show the uncertainty originating from varying
$S_{\rm DIS}$ between 0.5 and 0.7 (light grey band) and from varying $Q_{\rm cut}$ between 3 GeV and 9 GeV
(dark grey band) in part (b).
Figure 9: The di-jet cross section as a function of $Q^2$ in bins of $E_{T,1} + E_{T,2}$, measured by the H1 Collaboration
[53]. See Figs. [5] and [6] for an explanation of parts (a) through (d).
Figure 12: The di-jet cross section as a function of $\eta'$, measured by the H1 Collaboration [53]. $\eta'$ denotes
half the rapidity difference of the two leading jets in the Breit frame.
Figure 13: The differential di-jet cross section as a function of $\Delta$ in bins of mean $x$ and $Q^2$, measured by
the H1 Collaboration [55]. $\Delta$ is defined by $E^*_{T,\max} > E^*_{T,{\rm cut}} + \Delta$, where $E^*_{T,{\rm cut}}$ is the minimum
jet transverse energy and $E^*_{T,\max}$ is the transverse energy of the hardest jet.
Figure 14: The di-jet differential cross section for $\Delta = 2$ GeV as a function of $|\Delta\eta^*|$ in bins of $x$ and $Q^2$,
measured by the H1 Collaboration [55].
Figure 15: The three-jet cross section as a function of $Q^2$ (a), $\cos\theta_3$ (c) and $\psi_3$ (d), and the ratio of the
three- over the two-jet rate as a function of $Q^2$ (b), measured by the H1 Collaboration.
Figure 16: Jet shapes in bins of jet transverse energy and jet pseudorapidity in the Breit frame, measured
by the H1 Collaboration [55].
Figure 17: Sub-jet rates as a function of the resolution parameter $y_{\rm cut}$ in bins of jet transverse energy and
jet pseudorapidity in the Breit frame, measured by the H1 Collaboration [58].
Figure 18: Transverse energy flows measured by the H1 Collaboration [51]. The histogram labeled "Cluster"
displays results obtained with the cluster hadronisation model of [33], while "String" shows
predictions of the Lund string hadronisation [31].
Figure 19: Transverse energy flows measured by the H1 Collaboration [51]. See Fig. [17] for notation.
Figure 20: Charged particle transverse momentum spectra measured by the H1 Collaboration [B2]. The
histogram labeled "Set 1" shows predictions from the cluster hadronisation model of [33] in
combination with the NNPDF 1.2 PDF set, while "Set 2" displays predictions from the
Lund string hadronisation [31] in combination with the CTEQ 6L1 PDF set [67].
Figure 21: Charged multiplicity flow for single-particle transverse momenta larger than 1 GeV, measured
by the H1 Collaboration [52]. See Fig. [TO] for notation.
Figure 22: Charged multiplicity distributions in bins of $\eta$ and $W$, measured by the H1 Collaboration